
Poster: Phoenix_Sandman Date: Jul 17, 2007 4:30am
Forum: web Subject: Robots.txt Abuses Defeating the Archive's Intent

We were considering contributing to and promoting the Archive on the educational sites we're developing, but as time goes on we are finding that the robots.txt policy, though well intended, far too often renders the Archive completely useless.

We've noticed many sites being blocked by NEW owners of a domain, which also blocks the previous owner's pages - including pages for products that were honestly bought and paid for before the company lost control of its domain. Many companies are bought by a competitor or merged into oblivion, and some are even closed by government agencies on behalf of some corporation's best interests and against the public - most infuriating!

We've also seen domain parking companies using a robots.txt as standard policy once a domain is inherited by the site's hosting company, blocking access to pages just so they can "advertise" their services to visitors whose interest was earned by the previous holder of the domain.
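For context, a parked domain typically serves a blanket robots.txt that disallows everything for every crawler. A minimal sketch of the effect, using Python's standard urllib.robotparser (the file content and the example path are illustrative; "ia_archiver" is the user agent the Internet Archive's crawler has historically identified itself with):

```python
from urllib.robotparser import RobotFileParser

# A blanket robots.txt of the kind many domain parkers serve (illustrative).
PARKED_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(PARKED_ROBOTS_TXT.splitlines())

# The Archive's crawler is denied every path on the domain, including
# pages the previous owner once published (hypothetical example path).
print(parser.can_fetch("ia_archiver", "/products/manual.html"))  # False
```

Because the rule is evaluated per domain rather than per owner, the new registrant's two-line file speaks for every page ever hosted there.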

It seems very unfair that a well-intended policy can be abused so easily, allowing the blocking of pages that people have a right to access by way of having paid for them through purchases, expecting the seller to be there for a while at least.

The robots policy has definitely decreased our use of the Archive; we have wasted time in the past because robots.txt was blocking exactly what we wanted.

After having spent over 14 years in the computer business in VAR work, consulting, and programming, we think robots.txt was a good idea whose time has passed. If it had real enforcement under a set of comprehensive rules ensuring fair use, and so on, it might have been a different story. But it does not.


Poster: K9ine Date: Aug 23, 2007 3:16pm
Forum: web Subject: Re: Robots.txt Abuses Defeating the Archive's Intent

I totally agree that this is a problem when domains expire, and it needs to be addressed before more and more of the archive is lost. I do not know whether the Archive deletes the pre-robots.txt data or just marks it as blocked.


Poster: Telephone Toughguy Date: Jul 17, 2007 8:22am
Forum: web Subject: Re: Robots.txt Abuses Defeating the Archive's Intent

Should we shred all the old newspapers and take magnets to any cassette recordings off the radio that have commercials in between them as well? How about those old VCR recordings of TV with commercials no longer on the air? That is my take on the opting-out function with robots.txt. Why would you not want your page archived for historical purposes?... makes no sense. Just another way for big corporate interests to control and censor human expression and mandate history. I rarely use the Wayback Machine, but I occasionally like to look at old webpages of days gone by to help cite references and such; it would be a shame if something was gone because of the scenario you mention.